    A novel framework for assessing metadata quality in epidemiological and public health research settings

    Metadata are critical in epidemiological and public health research. However, a lack of biomedical metadata quality frameworks and limited awareness of the implications of poor-quality metadata render data analyses problematic. In this study, we created and evaluated a novel framework to assess the metadata quality of epidemiological and public health research datasets. We performed a literature review and surveyed stakeholders to enhance our understanding of biomedical metadata quality assessment. The review identified 11 studies and nine quality dimensions, none of which were specifically aimed at biomedical metadata. Ninety-six individuals completed the survey; of those who submitted data, most assessed metadata quality only sometimes, and eight did not assess it at all. Our framework has four sections: a) general information; b) tools and technologies; c) usability; and d) management and curation. We evaluated the framework using three test cases and sought expert feedback. The framework can assess biomedical metadata quality systematically and robustly.

    Can primary care electronic health records facilitate the prediction of early cognitive decline associated with dementia: a systematic literature review

    Introduction: Identifying the early stages of dementia is key to care management, clinical trial recruitment and mitigating the impact of cognitive impairment. At present, cognitive tests are most commonly used to investigate early stages of dementia and are often conducted only after initial symptoms of cognitive decline have been identified. There is potential to harness routinely collected data from electronic health records (EHR) to discover markers of early-stage dementia, in both its cognitive and non-cognitive manifestations. However, the extent to which primary care EHR can facilitate earlier diagnosis of dementia has not been systematically examined. We aim to determine the extent to which EHR can be used to identify prodromal dementia in primary care settings through a systematic review of the literature. Method: We searched electronic medical databases (including Scopus, Web of Science, OvidSP, MEDLINE and PsycINFO) for potentially relevant studies written in English and published up to and including September 2016. We used the following MeSH search terms: “dementia” (including its subtypes), “electronic health records” (and variations thereof) and “primary care”. Additionally, we searched grey literature, including reports released by the government, councils and relevant major UK charities. Results: We identified and reviewed 31 studies. In total, 35 risk factors and 147 potential markers of early cognitive decline were identified. There was considerable variability across studies as to whether markers were classed as confounders, risk factors, early markers or comorbidities. Markers predominantly fell within cognitive, affective, motor and autonomic symptoms; prescription patterns of both dementia and non-dementia medication; and health system utilization, including type of consultation, frequency of contact and duration. Three studies investigated variation in the markers’ predictive strengths at different time points during the prodromal period of dementia.
In the 24 months prior to diagnosis of dementia, gait disturbances, changes in weight, number of consultations, specialty referrals and hospital admissions showed the strongest association with dementia diagnosis. Over a longer prodromal period (up to 54 months), number of consultations, unpredictability in consulting patterns (such as “Did not attend”), and carer and social care involvement showed the strongest association with dementia diagnosis. Discussion: Tests that specifically investigate cognitive health, such as the Mini-Mental State Examination (MMSE), are often conducted only in the period of Mild Cognitive Impairment (MCI) preceding dementia diagnosis, once irremediable damage has occurred. In many cases, these symptoms are conflated with normal ageing or affective disorders, or attenuated by multimorbidities, and are therefore not directly linked to dementia. These results show that there is a broad range of potential markers that could be used to better define prodromal dementia; however, very little literature has been published in this area. Conclusion: There is significant potential to use routinely collected data from EHR to investigate and define prodromal dementia. The use of EHR allows us to obtain a more complete understanding of early-stage dementia, covering both its more commonly investigated cognitive signs and its non-cognitive presentations. Understanding the breadth and trajectories of the prodromal dementia period will be key to facilitating earlier diagnosis.
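Markers such as "number of consultations" are counted within a fixed period before diagnosis, for example 24 or 54 months. The sketch below shows that kind of lookback count in Python. It assumes each patient's consultations are available as a list of dates. The function name, the dates and the approximation of months as a whole number of days are all hypothetical illustrations and are not taken from the reviewed studies.

```python
from datetime import date, timedelta

def count_events_in_window(event_dates, index_date, window_days=730):
    """Count events in the half-open lookback window [index_date - window_days, index_date)."""
    start = index_date - timedelta(days=window_days)
    return sum(1 for d in event_dates if start <= d < index_date)

# Hypothetical consultation history for one patient diagnosed on 2016-06-01.
consultations = [date(2013, 5, 2), date(2014, 7, 9), date(2015, 1, 20),
                 date(2015, 11, 3), date(2016, 3, 14), date(2016, 6, 1)]
diagnosis = date(2016, 6, 1)

# Assumption: 24 months approximated as 730 days, 54 months as 1644 days.
print(count_events_in_window(consultations, diagnosis))        # 4
print(count_events_in_window(consultations, diagnosis, 1644))  # 5
```

The window is half-open, so a consultation on the diagnosis date itself is not counted as a prodromal marker.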

    Classification of atherothrombotic events in myocardial infarctions survivors with supervised machine learning using data from an electronic health record system

    The aim was to build a prediction model for subsequent atherothrombotic events in patients who survived a myocardial infarction. The dataset contained 7,582 patients from a national Electronic Health Record. The outcome is binary (event vs no event) over the five years following a myocardial infarction. Different classifiers were tested, and XGBoost achieved the best F1-score (0.76). The top features were imd_score, age_at_entry, egfr_ckdepi_base, height and SBP_base.
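The F1-score is the harmonic mean of precision and recall. The sketch below computes it for the event class from binary labels. The labels are made up for illustration. The abstract does not say whether its 0.76 is for the event class or averaged across both classes, and an averaged score would be computed differently. In practice, `sklearn.metrics.f1_score` would typically be used instead.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class of a binary outcome (1 = event, 0 = no event)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
print(f1_score(y_true, y_pred))  # 0.75
```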

    Discovering and validating disease subtypes for heart failure using unsupervised machine learning methods

    Notable heterogeneity exists in the clinical presentation of heart failure (HF) patients. Current subtype classifications, which are based on ejection fraction, may not fully capture the aetiological and prognostic heterogeneity of HF. Unsupervised machine learning (ML) approaches such as cluster analysis, applied to large-scale observational data from electronic health records (EHR), can enable the discovery of novel subtypes and guide the characterization of their clinical manifestations. Clustering methods can group HF patients based on similarities between their clinical features without making a priori assumptions about the distribution of the data. We sought to discover, characterize and replicate HF subtypes by applying a clustering method to a heterogeneous HF population derived from phenotypically rich EHR. Characterizing HF subtypes using EHR-derived variables may enable more precise large-scale genomic analyses to inform better prevention, diagnostic and treatment strategies.

    Identifying and evaluating clinical subtypes of Alzheimer's disease in care electronic health records using unsupervised machine learning

    BACKGROUND: Alzheimer's disease (AD) is a highly heterogeneous disease with diverse trajectories and outcomes observed in clinical populations. Understanding this heterogeneity can enable better treatment, prognosis and disease management. Studies to date have mainly used imaging or cognition data and have been limited in terms of data breadth and sample size. Here we examine the clinical heterogeneity of AD patients using electronic health records (EHR) to identify and characterise disease subgroups with multiple clustering methods, identifying clusters that are clinically actionable. METHODS: We identified AD patients in primary care EHR from the Clinical Practice Research Datalink (CPRD) using a previously validated rule-based phenotyping algorithm. We extracted a range of comorbidities, symptoms and demographic features as patient features. We evaluated four clustering methods (k-means, kernel k-means, affinity propagation and latent class analysis) to cluster AD patients. We compared clusters on clinically relevant outcomes and evaluated each method using measures of cluster structure, stability, efficiency of outcome prediction and replicability in external data sets. RESULTS: We identified 7,913 AD patients with a mean age of 82 years; 66.2% were female. We included 21 features in our analysis. We observed 5, 2, 5 and 6 clusters with k-means, kernel k-means, affinity propagation and latent class analysis, respectively. K-means produced the most consistent results based on four evaluative measures. We found a consistent cluster in three of the four methods. It was composed predominantly of female patients with younger disease onset (43% between ages 42 and 73) who were diagnosed with depression and anxiety, and it showed a quicker rate of progression than the average across other clusters.
CONCLUSION: Each clustering approach produced substantially different clusters, and k-means performed best of the four methods on the four evaluative criteria. However, the consistent appearance of one particular cluster across three of the four methods suggests the possible presence of a distinct disease subtype that merits further exploration. Our study underlines the variability of results obtained from different clustering approaches and the importance of systematically evaluating different approaches for identifying disease subtypes in complex EHR.
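k-means was one of the four methods compared. The sketch below implements Lloyd's algorithm for k-means in plain Python. It alternates between assigning each patient to the nearest centroid and moving each centroid to the mean of its patients. It assumes the patient features are already numeric and on comparable scales, and it takes explicit starting centroids so the result is reproducible. The function name and toy points are hypothetical. This is not the study's implementation, which would normally use a library with multiple random restarts. The other three methods (kernel k-means, affinity propagation, latent class analysis) are not shown.

```python
import math

def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps until stable."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        new_labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical two-feature data with two obvious groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
labels, centroids = kmeans(points, init_centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because k-means depends on its starting centroids, comparing runs from different initialisations is one simple way to probe the cluster stability the abstract evaluates.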

    Neutrophil Counts and Initial Presentation of 12 Cardiovascular Diseases: A CALIBER Cohort Study

    BACKGROUND: Neutrophil counts are a ubiquitous measure of inflammation, but previous studies on their association with cardiovascular disease (CVD) were limited by small numbers of patients or a narrow range of endpoints. OBJECTIVES: This study investigated associations of clinically recorded neutrophil counts with initial presentation for a range of CVDs. METHODS: We used linked primary care, hospitalization, disease registry, and mortality data in England. We included people 30 years or older with complete blood counts performed in usual clinical care and no history of CVD. We used Cox models to estimate cause-specific hazard ratios (HRs) for 12 CVDs, adjusted for cardiovascular risk factors and acute conditions affecting neutrophil counts (such as infections and cancer). RESULTS: Among 775,231 individuals in the cohort, 154,179 had complete blood counts performed under acute conditions and 621,052 when they were stable. Over a median 3.8 years of follow-up, 55,004 individuals developed CVD. Adjusted HRs comparing neutrophil counts 6 to 7 versus 2 to 3 × 10⁹/L (both within the 'normal' range) showed strong associations with heart failure (HR: 2.04; 95% confidence interval [CI]: 1.82 to 2.29), peripheral arterial disease (HR: 1.95; 95% CI: 1.72 to 2.21), unheralded coronary death (HR: 1.78; 95% CI: 1.51 to 2.10), abdominal aortic aneurysm (HR: 1.72; 95% CI: 1.34 to 2.21), and nonfatal myocardial infarction (HR: 1.58; 95% CI: 1.42 to 1.76). These associations were linear, with greater risk even among individuals with neutrophil counts of 3 to 4 versus 2 to 3 × 10⁹/L. There was a weak association with ischemic stroke (HR: 1.36; 95% CI: 1.17 to 1.57), but no association with stable angina or intracerebral hemorrhage. CONCLUSIONS: Neutrophil counts were strongly associated with the incidence of some CVDs, but not others, even within the normal range, consistent with underlying disease mechanisms differing across CVDs.
(White Blood Cell Counts and Onset of Cardiovascular Diseases: a CALIBER Study [CALIBER]; NCT02014610)
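Hazard ratios from Cox models are usually reported with Wald-type 95% confidence intervals. These are formed on the log scale as log(HR) ± 1.96 × SE and then exponentiated. As a worked check that is not part of the study, the sketch below recovers the approximate standard error of log(HR) from the reported heart failure interval and rebuilds the interval from it. It assumes the published CI is of this Wald type. Agreement is only approximate because the reported values are rounded. The function names are hypothetical.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def log_hr_se_from_ci(lower, upper, z=Z95):
    """Approximate SE of log(HR) from a 95% CI that is symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def wald_ci(hr, se, z=Z95):
    """95% CI for an HR: log(HR) +/- z * SE, exponentiated back to the ratio scale."""
    return math.exp(math.log(hr) - z * se), math.exp(math.log(hr) + z * se)

# Heart failure, neutrophils 6-7 vs 2-3 x 10^9/L: HR 2.04 (95% CI 1.82 to 2.29)
se = log_hr_se_from_ci(1.82, 2.29)
print(round(se, 4))                                  # 0.0586
print(tuple(round(x, 2) for x in wald_ci(2.04, se)))  # (1.82, 2.29)
```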

    White cell count in the normal range and short-term and long-term mortality: international comparisons of electronic health record cohorts in England and New Zealand

    OBJECTIVES: Electronic health records offer the opportunity to discover new clinical implications for established blood tests, but international comparisons have been lacking. We tested the association of total white cell count (WBC) with all-cause mortality in England and New Zealand. SETTING: Primary care practices in England (ClinicAl research using LInked Bespoke studies and Electronic health Records (CALIBER)) and New Zealand (PREDICT). DESIGN: Analysis of linked electronic health record data sets: CALIBER (primary care, hospitalisation, mortality and acute coronary syndrome registry) and PREDICT (cardiovascular risk assessments in primary care, hospitalisations, mortality, dispensed medication and laboratory results). PARTICIPANTS: People aged 30-75 years with no prior cardiovascular disease (CALIBER: N=686 475, 92.0% white; PREDICT: N=194 513, 53.5% European, 14.7% Pacific, 13.4% Maori), followed until death, transfer out of practice (in CALIBER) or study end. PRIMARY OUTCOME MEASURE: HRs for mortality were estimated using Cox models adjusted for age, sex, smoking, diabetes, systolic blood pressure, ethnicity and total:high-density lipoprotein (HDL) cholesterol ratio. RESULTS: We found 'J'-shaped associations between WBC and mortality; the second quintile was associated with lowest risk in both cohorts. High WBC within the reference range (8.65-10.05×10⁹/L) was associated with significantly increased mortality compared to the middle quintile (6.25-7.25×10⁹/L); adjusted HR 1.51 (95% CI 1.43 to 1.59) in CALIBER and 1.33 (95% CI 1.06 to 1.65) in PREDICT. WBC outside the reference range was associated with even greater mortality. The association was stronger over the first 6 months of follow-up, but similar across ethnic groups. CONCLUSIONS: Clinically recorded WBC within the range considered 'normal' is associated with mortality in ethnically different populations from two countries, particularly within the first 6 months.
Large-scale international comparisons of electronic health record cohorts might yield new insights from widely performed clinical tests. TRIAL REGISTRATION NUMBER: NCT02014610
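The analysis groups WBC into quintiles and uses the middle quintile as the reference category. The sketch below assigns quintile labels with Python's standard library. The WBC values are made up. The default "exclusive" method of `statistics.quantiles` is only one of several quantile conventions, so cut points may differ slightly from those used in the study. The function name is hypothetical.

```python
import bisect
import statistics

def quintile_labels(values):
    """Return the four quintile cut points and a 1-5 quintile label for each value."""
    cuts = statistics.quantiles(values, n=5)  # default method="exclusive"
    # bisect_right counts the cut points at or below v; +1 turns that into quintile 1..5
    return cuts, [bisect.bisect_right(cuts, v) + 1 for v in values]

wbc = [4.1, 5.0, 5.6, 6.0, 6.4, 6.8, 7.1, 7.6, 8.2, 9.5]  # hypothetical values, x10^9/L
cuts, quintiles = quintile_labels(wbc)
print(quintiles)  # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
```

Quintile 3 would then serve as the reference level when the labels enter a Cox model as a categorical covariate.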

    Low eosinophil and low lymphocyte counts and the incidence of 12 cardiovascular diseases: a CALIBER cohort study

    BACKGROUND: Eosinophil and lymphocyte counts are commonly performed in clinical practice. Previous studies provide conflicting evidence of association with cardiovascular diseases. METHODS: We used linked primary care, hospitalisation, disease registry and mortality data in England (the CALIBER (CArdiovascular disease research using LInked Bespoke studies and Electronic health Records) programme). We included people aged 30 or older without cardiovascular disease at baseline, and used Cox models to estimate cause-specific HRs for the association of eosinophil or lymphocyte counts with the first occurrence of cardiovascular disease. RESULTS: The cohort comprised 775 231 individuals, of whom 55 004 presented with cardiovascular disease over median follow-up 3.8 years. Over the first 6 months, there was a strong association of low eosinophil counts (<0.05 compared with 0.15-0.25×10⁹/L) with heart failure (adjusted HR 2.05; 95% CI 1.72 to 2.43), unheralded coronary death (HR 1.94, 95% CI 1.40 to 2.69), ventricular arrhythmia/sudden cardiac death and subarachnoid haemorrhage, but not angina, non-fatal myocardial infarction, transient ischaemic attack, ischaemic stroke, haemorrhagic stroke, subarachnoid haemorrhage or abdominal aortic aneurysm. Low eosinophil count was inversely associated with peripheral arterial disease (HR 0.63, 95% CI 0.44 to 0.89). There were similar associations with low lymphocyte counts (<1.45 vs 1.85-2.15×10⁹/L); adjusted HR over the first 6 months for heart failure was 2.25 (95% CI 1.90 to 2.67). Associations beyond the first 6 months were weaker. CONCLUSIONS: Low eosinophil counts and low lymphocyte counts in the general population are associated with increased short-term incidence of heart failure and coronary death. TRIAL REGISTRATION NUMBER: NCT02014610; Results.

    Analyzing the heterogeneity of rule-based EHR phenotyping algorithms in CALIBER and the UK Biobank

    Electronic Health Records (EHR) are data generated during routine interactions across healthcare settings and contain rich, longitudinal information on diagnoses, symptoms, medications, investigations and tests. A primary use case for EHR is the creation of phenotyping algorithms, which identify disease status, onset and progression, or extract information on risk factors or biomarkers. Phenotyping, however, is challenging: EHR are collected for different purposes, have variable data quality and often require significant harmonization. While considerable effort goes into the phenotyping process, no consistent methodology for representing algorithms exists in the UK. A national repository of curated algorithms could enable algorithm dissemination and reuse by the wider community. A critical first step is a robust minimum information standard for phenotyping algorithm components (metadata, implementation logic, validation evidence), which requires identifying and reviewing the complexity and heterogeneity of current UK EHR algorithms. In this study, we analyzed all available EHR phenotyping algorithms (n=70) from two large-scale contemporary EHR resources in the UK (CALIBER and UK Biobank). We documented EHR sources, controlled clinical terminologies, evidence of algorithm validation, representation and implementation logic patterns. Understanding the heterogeneity of UK EHR algorithms and identifying common implementation patterns will facilitate the design of a minimum information standard for representing and curating algorithms nationally and internationally.
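One common implementation pattern for a rule-based phenotype works as follows. A patient is a case if any record carries a code from a source-specific code list, and onset is the date of the earliest such record across sources. The sketch below illustrates this pattern. It assumes records arrive as (source, code, date) tuples. The code lists are short, illustrative heart failure examples and not a validated list. The function and constant names are hypothetical. This is not one of the 70 algorithms analyzed.

```python
from datetime import date

# Illustrative code lists only; real algorithms use curated, validated lists.
PRIMARY_CARE_CODES = {"G58..", "G580."}  # Read v2 examples (primary care)
HOSPITAL_CODES = {"I50.0", "I50.9"}      # ICD-10 examples (hospital records)

def phenotype(records):
    """Return (is_case, onset_date) from a list of (source, code, date) records."""
    matches = [d for source, code, d in records
               if (source == "primary_care" and code in PRIMARY_CARE_CODES)
               or (source == "hospital" and code in HOSPITAL_CODES)]
    if not matches:
        return False, None
    return True, min(matches)  # onset = earliest qualifying record in any source

records = [("primary_care", "G58..", date(2015, 3, 1)),
           ("hospital", "I50.9", date(2014, 11, 20)),
           ("hospital", "J45.9", date(2013, 1, 5))]  # not in either list: ignored
print(phenotype(records))  # (True, datetime.date(2014, 11, 20))
```

Keeping the code lists separate from the matching logic is one way to make the "controlled clinical terminologies" and "implementation logic" components documentable on their own.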

    Selective recruitment designs for improving observational studies using electronic health records

    Large‐scale electronic health records (EHRs) present an opportunity to quickly identify suitable individuals and directly invite them to participate in an observational study. EHRs can contain data from millions of individuals, raising the question of how to optimally select a cohort of size n from a larger pool of size N. In this article, we propose a simple selective recruitment protocol that selects a cohort in which covariates of interest tend to have a uniform distribution. We show that selectively recruited cohorts potentially offer greater statistical power and more accurate parameter estimates than randomly selected cohorts. Our protocol can be applied to studies with multiple categorical and continuous covariates. We apply our protocol to a numerically simulated prospective observational study using an EHR database of stable acute coronary disease patients from 82 089 individuals in the U.K. Selective recruitment designs require a smaller sample size, leading to more efficient and cost‐effective studies.
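The idea of selecting a cohort whose covariates are close to uniformly distributed can be illustrated for a single categorical covariate. The sketch below samples round-robin across categories rather than at random from the whole pool. It is a simplified illustration, not the authors' protocol, which also handles multiple and continuous covariates. The function name, field names and simulated pool are hypothetical.

```python
import random
from collections import Counter

def select_uniform(pool, covariate, n, seed=0):
    """Select n individuals so a categorical covariate is as close to uniform as the pool allows."""
    rng = random.Random(seed)
    groups = {}
    for person in pool:
        groups.setdefault(person[covariate], []).append(person)
    for members in groups.values():
        rng.shuffle(members)  # random order within each category
    selected = []
    # Round-robin over categories: one person per non-empty category per pass.
    while len(selected) < n and any(groups.values()):
        for key in sorted(groups):
            if groups[key] and len(selected) < n:
                selected.append(groups[key].pop())
    return selected

# Hypothetical pool with a skewed age distribution (10% / 60% / 30%).
bands = ["<50"] * 100 + ["50-69"] * 600 + ["70+"] * 300
pool = [{"id": i, "age_band": b} for i, b in enumerate(bands)]
cohort = select_uniform(pool, "age_band", 150)
print(Counter(p["age_band"] for p in cohort))
```

A random sample of 150 from this pool would mirror its 10/60/30 split, leaving few individuals in the youngest band. The round-robin selection takes 50 from each band. If a category runs out, its remaining slots go to the other categories.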